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0^ ' of density functional theory in which the ground state energy is determined 



tial calculations is described in detail. The method is based on a formulation 



by minimization with respect to the density matrix, subject to the condition 
that the eigenvalues of the latter lie in the range [0,1]. Linear-scaling behav- 
ior is achieved by requiring that the density matrix should vanish when the 
^ ' separation of its arguments exceeds a chosen cutoff. The limitation on the 



eigenvalue range is imposed by the method of Li, Nunes and Vanderbilt. The 
scheme is implemented by calculating all terms in the energy on a uniform 
real-space grid, and minimization is performed using the conjugate-gradient 
method. Tests on a 512-atom Si system show that the total energy converges 
rapidly as the range of the density matrix is increased. A discussion of the re- 
lation between the present method and other linear-scaling methods is given, 

and some problems that still require solution are indicated. 
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I. INTRODUCTION 



During the last decade, first-principles total-energy methods based on density functional 
theory (DFT) combined with the pseudopotential method, have become established as a 
major tool in the study of condensed matter [|lj. The DFT-pseudopotential approach is now 
widely used for both static and dynamic simulations on an enormous range of condensed- 
matter problems. However, these methods suffer from a severe drawback in that their 
computational cost generally increases as the cube of the number of atoms in the system. 
This unfavorable scaling limits the size of systems that can be studied with current methods 
and today's computers to a few hundred atoms at most. This 0(N 3 ) scaling appears in spite 
of the fact that the complexity of the problem increases only linearly with the system size. 
This observation suggests that the unfavorable scaling of current methods is a consequence 
of the way in which the electronic structure problem is being addressed. Conventional 
methods rely either on diagonalization of the Hamiltonian or orthonormalization of a set of 
occupied orbitals, both of which are intrinsically 0(N 3 ) operations. It is clear that more 
efficient methods in which the effort is proportional to the number of atoms must be possible, 
and in recent years a considerable effort has been devoted to finding such 'linear-scaling' 
schemes P-p~3|j. 

The earliest linear-scaling scheme appears to be the 'divide and conquer' method of 
Yang |fj|. This obtains the electronic density and hence the total energy by dividing the 
system into overlapping sub-systems that can be treated independently. The density is 
calculated for each sub-system with conventional LCAO-DFT. The Hamiltonian for each 
sub-system, which includes the potential due to the other sub-systems, is diagonalized in- 
dependently, thus avoiding the need to diagonalize the full Hamiltonian. This procedure is 
repeated until self-consistency is achieved. Baroni and Giannozzi |3j also proposed a scheme 
that directly determines the electron density. They do this by discretizing the Kohn-Sham 
Hamiltonian on a real-space grid, and then using the recursion method of Haydock, Heine 
and Kelly [fnj to obtain the diagonal elements of the Green's function, from which the 
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electron density can be computed by contour integration. In this case linear scaling results 
from the fact that the continued fraction used to evaluate a particular diagonal element of 
the Green's function can be truncated once a certain neighborhood of each point has been 
explored. This neighborhood is independent of the system size for sufficiently large systems. 

More recently, several new schemes that resemble traditional first-principles methods 
have been reported. Galli and Parrinello [|J pointed out that some improvement could be 
achieved in the scaling of a conventional DFT calculation by requiring spatial localization 
of the electronic orbitals. This localization was achieved by adding certain non-local con- 
straining terms to the Hamiltonian, or by using a filtering procedure. The total energy can 
then be obtained as a functional of the localized orbitals and their conjugate orbitals 
\4>i) = J2j Sj^l^j), but in order to obtain these conjugate orbitals, the overlap matrix S has 
to be inverted. Since spatial localization implies sparsity of S, this can be achieved in 0(N 2 ) 
operations, so that some improvement with respect to 0(N 3 ) is obtained. A step further 
in this direction was made independently by Mauri, Galli and Car (hereafter referred 
to as MGC) and by Ordejon et al. They introduced a new functional of the occupied 

orbitals that possesses the same ground state as the conventional energy functional, but 
with the added advantage of leading naturally to orthogonal orbitals when minimized. If 
this new functional is minimized with respect to orbitals that are constrained to remain 
localized in chosen regions of space, as suggested by Galli and Parrinello a linear scaling 
method results. In the original formulation, the number of orbitals entering the new func- 
tional is equal to half the number of electrons in the system. This restriction seems to lead 
to very slow convergence, and to the appearance of spurious local minima in the functional. 
This problem has been recently overcome by Kim, Mauri and Galli 0, by generalizing the 
functional so that it depends on an arbitrary number of orbitals. 

The linear-scaling scheme most relevant to the present work is that put forward by Li, 
Nunes and Vanderbilt [jnj (hereafter referred to as LNV) in the context of tight-binding 
semi-empirical calculations. In this method, linear scaling is achieved by taking advantage 
of the real space localization properties of the density matrix, p(r, r') . By introducing a 
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spatial cutoff R c in p, such that p(r, r') is set to zero if |r — r'| > R c , the number of non-zero 
elements in p increases only linearly with the system size. The electronic structure problem 
is then formulated as a minimization of the total energy with respect to the truncated density 
matrix, subject to the constraints of idempotency (p 2 = p) and correct trace (2Trp = N e , 
where N e is the number of electrons). The scheme of LNV consists of an algorithm for 
imposing these constraints that at the same time fulfils the goal of linear scaling. The 
idempotency of p is the most difficult constraint to impose, and this scheme achieves it by 
expressing p in terms of an auxiliary matrix, which we denote in this paper by a. This is 
subjected to a purifying transformation due to McWeeny ||15|| . If a is a near-idempotent 
matrix, i.e. if its eigenvalues lie close to or 1, this transformation will return p as a more 
nearly idempotent matrix, and thus it is possible to minimize the total energy with respect 
to er while ensuring the near idempotency of p. By construction, the method is variational 
(i.e. min E(R C ) > min.E'(oo)), and it has been shown that the convergence of calculated 
properties with the parameter R c is fairly rapid [|10],|rTJ. It is now being widely used in 
tight-binding simulations of large systems. 

Recently, the idea of working with the density matrix has been applied to DFT linear 
scaling schemes. This has been done independently by Hierse and Stechel [O and by 



Hernandez and Gillan |13| . In both cases, the density matrix is represented in real space as: 



p(v,r')=Y,Mr)K^ ( j ) ;( r '), (1) 

a/3 

where the <p a are a set of localized functions, and K a p is a symmetric matrix. The total 

energy is expressed in terms of p(r, r'), and minimization is carried out with respect to 



both the (j) a and the K a p. Hierse and Stechel [|T2| use a number of functions (j) a equal to 
the number of occupied orbitals, but this restriction is not present in our scheme. The 
consequences of this and other differences between the two methods will be addressed later 
in this paper. 

Previously, only a brief description of our method has been published [FTS] . In this paper 
we give a detailed description of the method, together with some illustrations of its practical 



performance and a discussion of its relation to other methods. In section 2, the method 
is outlined and its theoretical foundations are discussed. The practical implementation 
of the method is then described in section 3. The tests we have performed to probe the 
practical usefulness of the scheme are presented in section 4. In section 5, we assess what 
has been achieved and we discuss possible future developments, with particular attention to 
the problems that need to be overcome before the method can be generally applied. Some 
of the mathematical analysis is reported in an Appendix. 



II. FORMULATION OF DFT IN TERMS OF THE DENSITY MATRIX 

A. Density functional theory 

We need to recall briefly the principles of DFT [|l(J. The total energy E tot of the system 
of valence electrons and atomic cores is expressed as: 

-^tot = E K + E ps + E R + E xc + E M , (2) 

where the terms on the right are the kinetic, pseudopotential, Hartree and exchange- 
correlation energies of the electrons, and Em is the Madelung energy of the cores. The 
first two energies are: 

N h 2 

E K = 2J2(A\ -t^V 2 |^> 

N 

E ps = 2Y,mv ps m , (3) 
i=i 

where ipi are the Kohn-Sham (KS) orbitals, V ps is the total pseudopotential operator, and 
N = \N e is the number of occupied orbitals. The energies En and E xc can be written in 
terms of the electron number density n(r): 

En — J drdr' n(r)n(r') / \r — r'| 

E xc = J drn(r)e xc (n(r)) , (4) 
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where for simplicity we assume the local density approximation (LDA) for E xc , with e xc the 
exchange- correlation energy per electron. The number density is: 

JV 

n(r) = 2£hMr)| 2 . (5) 

i=i 

The important principle for present purposes is that the true ground-state energy and elec- 
tron density are obtained by minimizing E tot with respect to the KS orbitals, subject to the 
constraint that the latter are kept orthonormal. 

In the standard formulation of DFT, which we have just summarized, all the occupied 
orbitals are fully occupied. However, it is frequently convenient, for physical, computational 
or formal reasons, to generalize the theory so that orbitals can be partially occupied. Spatial 
orbital i^i(r), rather than containing 2 electrons, may now contain 2/j electrons, where the 
occupation number lies in the range < fi < 1. The number density n(r) now becomes: 

n(r)=2£/^(r)| 2 , (6) 

i 

and the kinetic and pseudopotential energies are: 

i 2m 

E ps = 2j2mi\v PS m . (7) 

i 

The expressions for E^ and E xc in terms of n(r) are unchanged. 

The usual physical reason for making this generalization is that one wishes to treat 
the electrons at a non-zero temperature, in which case the fi are Fermi-Dirac occupation 
numbers ]l7|; computationally, the generalization is sometimes made in order to get rid of 



the troublesome discontinuity at the Fermi level in metallic systems [0,0. Our reason for 
considering it here is that it will be relevant to the density matrix formulation. We shall 
assume that if E tot is minimized both with respect to the ipi (subject to orthonormality) 
and with respect to the fi (subject to the restriction < fi < 1 and the condition that the 
sum fi be equal to |iV e ), then we arrive at exactly the ground state that is obtained by the 
more usual minimization with respect to fully occupied states ipi. Another way of putting 



this is that the energy cannot be reduced below the normal ground state by allowing partial 
occupation. 

Now we turn to the density matrix, which is defined by 

p(r,r')=E/^( r M( r ') ■ (8) 

i 

It follows from this definition that p(r, r') is a Hermitian operator whose eigenvalues are all in 
the interval [0, 1]. The converse is also true: a Hermitian operator p(r, r') whose eigenvalues 
are fi and whose eigenf unctions are ipi(r) can be written as in equation In terms of 
such an operator p(r, r'), let the kinetic energy, pseudopotential energy and number density 
be defined as: 

Ek = ~- fdr (V r 2 p(r,r')) , 

m J V /r=r' 

E ps = 2j drdr'y ps (r',r)p(r,r') 

n(r) = 2p(r,r) , (9) 

with En and E xc expressed in the usual way in terms of n(r). It follows from what we have 
said before that if E tot is minimized with respect to p(r, r') subject to the condition that 
the eigenvalues of the latter are in the required interval and add up to \N e , then we arrive 
at the usual ground state. This is the density matrix formulation of DFT. 

B. Localization of the density matrix 

Since DFT is variational, any restriction placed on the class of density matrices p(r, r') 
that can be searched over has the effect of raising the minimum energy E m \ n above its true 
ground-state value E ; progressive relaxation of such a restriction makes E m[n tend to E . 
Now in general the density matrix in the true ground state tends to zero as the separation 
of its arguments |r — r'| increases. This strongly suggests the usefulness of estimating E by 
searching over p(r, r') with the restriction that: 

p(r,r') = 0, |r-r'| > R c , (10) 
7 



where R c is a chosen cutoff radius. The resulting estimate E m i n (R c ) will tend to E from 
above as R c — > oo. The manner in which p(r, r') goes to zero at large separations depends 
on the electronic structure of the system, and particularly on whether there is a gap between 
the highest occupied and lowest unoccupied states. It is rigourously established that in one- 
dimensional systems having a gap p decays exponentially with separation, while in gapless 
systems it decays only as an inverse power ||20|| . It is presumed that three-dimensional 
systems behave similarly. This suggests - though to our knowledge it is unproven - that 
E m i n (R c ) — > Eq exponentially for insulators and algebraically for metals. 

Clearly in practical calculations we cannot work directly with a six-dimensional function 
p(r,r'), even if it vanishes beyond a chosen radius. It is essential that p be separable, i.e. 
representable in the form: 



For practical purposes, there must be only a finite number of <f) a (r) functions, which will be 
referred to as support functions. For p to be Hermitian, we must require that the matrix 
K a p be Hermitian. The restriction to a finite number of support functions is equivalent to 
the condition that p have only this number of non-zero eigenvalues, and this is the essence 
of the separability requirement. With this, we now have two independent restrictions on p: 
localization and separability. The localization of p can be imposed by requiring that the 
support functions be non-zero only within chosen regions, which we call the support regions, 
and that the coefficients K a p vanish if the separation of the support regions of Q and 4>p 
exceeds a chosen cutoff. 

We now have a general framework for linear-scaling DFT schemes. In practical calcula- 
tions, the <f) a functions will be represented either as a linear combination of basis functions, 
or simply by numerical values on a grid. Either way, the amount of information contained in 
a support function will be independent of the size of the system. The amount of information 
in the support functions will then scale linearly with the size of the system, and the number 
of K a /3 coefficients will scale in the same way. This in turn implies that the electron density 




(11) 
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n(r) and all the terms in the total energy can be calculated in a number of operations which 
scales linearly with system size. 



C. Eigenvalue range of the density matrix 

In this general scheme, the ground state is determined by searching over support functions 
and K a p matrices. However, it is essential that this search be confined to those <p a and K a @ 
for which the eigenvalues of p(r, r') lie in the interval [0,1]. This is a troublesome condition 
to impose, because we certainly do not wish to work directly with these eigenvalues. We can 
achieve what we want by expressing p in a form that satisfies the condition automatically. 

The scheme developed in this paper is the DFT analogue of the tight-binding scheme of 



LNV pOfl . We write the density matrix as: 

p = 3a * a — 2a * a * a , (12) 

where <r(r, r') is an auxiliary function. (The asterix here indicates the continuum analogue 
of matrix multiplication. For arbitrary two-point functions A(r, r') and B(r, r'), we use the 
notation C = A * B as a short-hand for the statement: 

C(r, r') = J dr" A(r, r")B(r", r') .) (13) 

The reason this works is that the eigenvalues X p of p automatically satisfy < X p < 1 
provided the eigenvalues \ a of a are in the range — | < A CT < |; in addition, A p has turning 
points at the values and 1. Since the ground state is obtained when X p = or 1, there is 
a natural mechanism whereby variation of a drives p towards idempotency. 
To obtain the separable form of p (Eq. ([TT|) ), we write: 

( r(r,r')=E^( r )^/3( r ') ) ( 14 ) 

a/3 

which implies the matrix relation: 

K = 3LSL - 2LSLSL , (15) 



where S a p is the overlap matrix of support functions: 

S af} = J dT(j> a {v)(j>p{T) . (16) 

The ground state is now obtained by minimizing E tot with respect to the (f) a and the L a p 
matrix, with the K a p matrix given by Eq. (|T3[). In the practical calculations reported later, 
the (p a are non-zero only inside spherical regions of radius -R reg , and the L a p are non-zero 
only if the centres of the regions a and (3 are separated by less than a cutoff distance i?L- 

It will be useful for the purposes of later discussion to note how a closely related scheme 
leads back to the MGC method ||. This scheme is obtained by writing: 

p = a*(2-a), (17) 

where a is required to be positive semi-definite. Since the eigenvalues A CT can be expressed 
as n 2 a where n a is real, the eigenvalues of p are given by: 

\ p = X a (2-\ r )=4(2-4). (18) 

This quartic function lies in the range [0,1] for < 2 1//2 and has turning points when 
X p = and 1. This give an alternative mechanism for driving p towards idempotency. With 
a given, as before, by Eq. (|HD, it is straightforward to show that a is positive semi-definite 
if and only if the matrix L a p is positive semi-definite, and this is equivalent to the condition 
that L a p be expressible as: 

^ = E^H S) - (19) 

s 

The result is that cr(r, r') must have the form: 

C T(r,r') = Ex (s) (r)x (s) (r'), (20) 

s 

where: 

x M(r)=5>W0 a (r). (21) 

a 

Following arguments presented by Nunes and Vanderbilt J21J in the tight-binding context, it 



can now be shown that this scheme is exactly equivalent to the linear-scaling DFT scheme 
of MGC. 
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III. PRACTICAL IMPLEMENTATION OF THE METHOD 



A. The real-space grid 

We now give a prescription for the calculation of the energy functional, and of its deriva- 
tives with respect to the support functions <p a and the L a p parameters, and we describe how 
minimization of the energy can be carried out in practice. Central to our implementation 
of the method described in the previous section is the use of a regular cubic real-space grid, 
spanning the whole system under study. There have been a number of recent implementa- 
tions of conventional DFT-pseudopotential calculations using real-space grids [^2| -|25||. 

The support functions are represented by their values at the grid points. Since these 
functions are required to be spatially localized, they have non-zero values only on the grid 
points inside the localization regions. In the present work, these regions are chosen to be 
spherical, and their centres are at the atomic positions. Real-space integration is replaced 
by summation over grid points, so that e.g. the overlap matrix elements are calculated as: 

Sap ~ 5ujJ2<f) a (r e ) (p^Yf) , (22) 
re 

where the sum goes over the set of grid points common to the localization regions of both 
(p a and (pp, and 8uj is the volume per grid point. 

The action of the kinetic energy operator on the support functions is evaluated using a 
finite difference technique. To nth order in the grid spacing, h, we have that 

d 2 4> a 1 - 

-— -(n x ,n y ,n z ) ~ — ^ C\ m \<f) a {n x + m,n y ,n z ), (23) 
Ox a m= _ n 

where n x ,n y and n z are integer indices labelling grid point r^, and the coefficients C\ m \ can 
be calculated beforehand. Equivalent expressions can be used for d 2 4> a /dy 2 and d 2 (p a /dz 2 , 
and it is thus possible to evaluate V 2 a approximately at each grid point. From Eqs. (|9|) 
and fli~l|) , the kinetic energy is given by: 

E K = 2Y,K«pT fla , (24) 

a/3 
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where 

Tp a = ~ J dv<P P {v)V 2 r <p a {v) . (25) 

Once V 2 a (r) has been evaluated at each grid point using Eq. (|23|), the T a p matrix elements 
are calculated by summing over grid points, just as for S a p (see Eq. ([22])). 

In order to evaluate the exchange-correlation, Hartree and pseudopotential contributions 
to the total energy, we first need to evaluate the electron density at each grid point. From 
Eqs. (H) and (|Tl[) , the density at grid point r £ is: 



n(r t ) = 2j2<f ) a(re)K a/3 (j) f3 (r £ ) . (26) 

a/3 

From this, it is straightforward to evaluate the exchange-correlation energy by summing the 
quantity n(r ()t xc [n(r ()] over grid points. The exchange-correlation potential fi xc can also be 
calculated at each point, and is given as 

Hxcfa) = -^{n(r e )e xc [n(r e )]} . (27) 

To obtain the Hartree energy and potential we use the fast Fourier transform (FFT) 
method to transform the calculated electronic density into reciprocal space, thus obtaining 
its Fourier components uq. The Hartree energy is then given as 

E H = 2vrfie 2 I^gWG 2 , (28) 

G^O 

where Q is the volume of the simulation cell. The Hartree potential in reciprocal space is: 

Vff(G) = 4irtte 2 n G /G 2 . (29) 

This can be constructed on the reciprocal-space grid, and transformed to obtain the Hartree 
potential in real space. FFT is, of course, an 0(Nlog 2 N) operation rather than an O(N) 
operation, but the difference is negligible for present purposes. 

We restrict ourselves here to local pseudopotentials, so that the value of the total pseu- 
dopotential Vp S (r^) at grid point is formally given by: 

12 



V ps (r e )=Y,v P s(\re-Ri\) , (30) 
i 

where v ps (r) is the ionic pseudopotential and R/ is the position of ion /. In practice, however, 
V ps (r£) cannot be calculated like this, because v ps (r) has a Coulomb tail —Z\e\ 2 /r at large r, 
where Z is the core charge. In order to obtain a linear-scaling algorithm for E ps , we proceed 
as follows. The ionic pseudopotential is represented as the sum of the Coulomb potential 
due to a Gaussian charge distribution r/(r) and a short-range potential Vp S (r). The total 
charge in r](r) is Z\e\, and the distribution is given by: 

r](r) = Z\e\(a/7r) 3/2 exp(-ar 2 ) , (31) 

where the parameter a governs the rate of decay of the Gaussian. We therefore have: 

v ps (r) = -^erf(« 1/2 r) + v° ps (r) . (32) 

The part of V ps coming from v® can now be calculated as a direct sum over ions, as in 
Eq. (|30|). Since v ps can be neglected beyond a certain radius, this part of the calculation 
scales linearly. The part of V ps coming from the array of Gaussians can be treated in exactly 
the same way as the Hartree potential. The pseudopotential energy is then calculated by 
summation over the real-space grid: 

E ps = 5uj^2 V ps (re)n(rt) . (33) 

e 

B. Derivatives and minimization 

Once the contributions to the total energy have been obtained as outlined above, we 
need to vary both L a p and <ft a in order to minimize it. The L a p and 4> a are independent 
variables, and the problem breaks naturally into two separate minimizations that can be 
carried out in an alternating manner: one with respect to L a p with fixed 4> a , and the other 
with respect to <fi a with fixed L a p. Indeed, the choice of object function can be different for 
the two types of variation, and when minimizing with respect to the L Q/ g we find it more 
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convenient to take Q = E to t — as our object function, where /i is the chemical potential 
and N e is the electron number. We return to this point below. 

Expressions for the derivatives with respect to L a p and (f) a are obtained in Appendix A. 
The partial derivative of Q with respect to L a p is given by 
dfl 

— — = [G(SLH' + H'LS) - 4(SLSLH' + SLH'LS + H'LSLS)] a0 (34) 

where H' = H — fiS, and H is the matrix representation of the KS Hamiltonian in the 
support function representation. It is worth noting that this expression is exactly the same as 



would be obtained in a non-orthogonal tight-binding formalism [ 2B| . There is, however, one 
important difference: in self-consistent DFT calculations the Hamiltonian matrix elements 
depend on L a p through the electronic density n(r). The partial derivative of the total energy 
with respect to 4> a at grid point 17 is given by 

— = A5ujJ2 \K a pH + 3(LHL) a p - 2(LSLHL + LHLSL) a p] <f> p (r t ), (35) 

where H is the Kohn-Sham operator, which is made to act on support function (pp. 

It is important to notice that because of the spatial localization of the support functions, 
and the finite range of L, all the matrices involved in the calculation of these derivatives 
are sparse, when the system is large enough. Provided this sparsity is exploited in the 
computational scheme, the method scales linearly with the size of the system. 

In the scheme of LNV [0 , it is proposed to work at constant chemical potential, rather 



than at constant electron number. We prefer to maintain the electron number constant. 
The variations with respect to L a p and 4> a will in general cause the electron number to differ 
from the correct value, and it is therefore necessary to correct this effect as the minimization 
proceeds. We achieve this in the following manner: during the minimization with respect 
to L, the current search direction is projected so that it is tangential to the local surface 
of constant N e , i.e. perpendicular to V^A^e at the current position. This ensures that the 
minimization along this direction will cause only a small change in N e , and it is expected 
that at the new minimum N e will differ only slightly from the required value. In any case, it 
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is possible to return to a position as close as desired to the constant N e surface by following 
the local gradient V ' lN g . If the value of the chemical potential \x is appropriately chosen, this 
correction step can be carried out without losing the reduction in Q obtained by performing 
the line minimization, and this is why we prefer to take fl as the object function instead of 
the total energy, when minimizing with respect to L. We find that this scheme is capable 
of maintaining the electron number close to its correct value throughout the minimization, 
and is also simple to implement. The gradient ViN e has elements 

dN 

— - = 12(SLS - SLSLSU , (36) 

which, as all other gradients discussed earlier, can be calculated in O(N) operations. Min- 
imization with respect to (j) a (re) will also have the effect of changing the electron number. 
However, given that the two types of variation are performed alternately, the correction 
during the L minimization is sufficient to counteract this effect. 

Given that variation of L a p causes the electronic density to change, and this in turn 
implies that the Hamiltonian matrix elements change, it would seem necessary to update 
the Hamiltonian at each step of the minimization with respect to L. However, we find that 
this can be avoided by considering H fixed during this part of the minimization. Strictly 
speaking, if H is held fixed while L is varied, we are not minimizing Q = E tot — fiN but 
rather Q' = E' — fiN, where E' is given by 

E' = Tr[(QLSL - ALSLSL)H] . (37) 

If this minimization were carried out through to convergence, this would be equivalent to 
diagonalizing H in the representation of the current support functions. At convergence, 
it will be found in general that L and H are not mutually consistent, and if consistency 
is required, one needs to update H and repeat the minimization, iterating this cycle until 
consistency was achieved. This is not necessary in practice, because H will be updated 
at the next variation with respect to the support functions. The minimization of Q' has 
practical advantages in that it avoids the updating of the Hamiltonian at each step, and, 
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because of its construction, it is a cubic polynomial in every possible search direction, so it 
is possible to find the exact location of line minima during its minimization. 

The minimization with respect to 4> a {^e) can be carried out by simply moving along the 
gradient dEt ot /d<f) a (rg) Eq. ( pop (steepest descents) or by using this expression to construct 
mutually conjugate directions (conjugate gradients). 



IV. TEST CALCULATIONS 

In order to test our O(N) DFT scheme, we have performed calculations on a system of 
512 Si atoms treated using a local pseudopotential. The purpose of these tests is to find out 
how the total energy depends on the two spatial cutoff radii: the support-region radius -R reg , 
and the L- matrix cutoff radius R^. The practical usefulness of the scheme, and the size of 
system for which linear-scaling behavior is attained depend on the rate of convergence of 
E tot to its exact value as R reg and are increased. Here, 'exact' refers only to the absence 
of errors due to the truncation of p(r, r'); other sources of inexactness, such as the use of a 
discrete grid and a local pseudopotential, are of no concern here. 

The system treated is a periodically repeating cell containing 512 atoms of diamond- 
structure Si having the experimental lattice parameter (5.43 A). The local pseudopotential 
is the one constructed by Appelbaum and Hamann |27| , which is known to give a satisfactory 
representation of the self-consistent band structure. The LDA exchange-correlation energy 
is calculated using the Ceperley- Alder formula f2"8R . We use a grid spacing of 0.34 A, which 



is similar to the spacing typically used in pseudopotential plane-wave calculations on Si, 
and is sufficient to give reasonable accuracy. The second derivatives of the 4> a needed in the 
calculation of Ek are computed using the second-order formula given in Eq. ([231). 

A support region is centred on every atom, and each such region contains four support 
functions. One can imagine that these support functions correspond roughly to the single 3s 
function and the three 3p functions that would be used in a tight-binding description, but 
we stress that nothing obliges us to work with this number of support functions. In keeping 
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with the tight-binding picture, the initial guess for the support functions is taken to be a 
Gaussian multiplied by a constant, x, y or z, so that the functions have the symmetry of s 
and p states. As an initial guess for the L-matrix, we take the quantity 21 — S, where S 
is the overlap matrix calculated for the initial support functions. This guess for L, which 
represents the expansion of S^ 1 = (I — (I — S))^ 1 to first order, is crude, and does not 
yield the correct value of Tr p. This error is corrected by displacing L iteratively along the 
gradient ViN e until N e is within a required tolerance of the correct value. 

The initial guesses for the <p a and the L a p define the initial Hamiltonian and overlap 
matrices. From this starting point, we make a number of conjugate-gradient line searches 
to minimize Q by varying L, with the Hamiltonian and overlap matrices held fixed. This is 
followed by a sequence of line searches in which the <p a are varied. We refer to the sequence 
of L moves followed by a sequence of moves as a cycle. The entire energy minimization 
consists of a set of cycles. In practice, we have found that cycles consisting of five L moves 
and two moves work satisfactorily, and that E tot is converged to within 10 -4 eV/atom 
after typically 50-60 cycles. This would not be an efficient rate of convergence for routine 
applications, but is more than adequate for present purposes. 



Our test calculations confirm our earlier finding |T3j that for the Si perfect crystal E tot 
is already quite close to its exact value when = 5.0 A. We have therefore used this value 
of i?L to make calculations of E tot as a function of -R reg (see fig. la). The results show that 
-E to t converges very quickly with increasing -R reg , and that it is within ~ 0.1 eV of its fully 
converged value for R reg = 3.05 A. In order to show how -Etot depends on R^, we present a 
series of results at the two region radii -R reg = 2.04 and 2.38 A (see fig. lb). These results 
indicate that there is only a slow variation with R^ and that this variation is almost the 
same for different values of -R reg . This means that it is possible to converge the total energy 
to satisfactory accuracy with easily manageable spatial cutoffs. 

It is interesting to know the form of the support functions for the self-consistent ground 
state. These are shown in fig. 2 for the case -R reg = 3.05, R^ = 5 A. The support functions 
shown here are the first (initially s Gaussian) and second (p x Gaussian). Profiles of the 
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support functions along the [100], [110] and [111] directions are shown. The support functions 
are seen to be symmetric with respect to the center of the support region (r = 0) along the 
[100] and [110] directions. Along the [111] direction there is a slight asymmetry resulting 
from the presence of a nearest neighbour ion, which lies at 2.35 A from the origin in the 
positive direction. Remarkably, the s-like support function seems to be almost perfectly 
spherically symmetric, except near the peak at r « 0.8 A. It is encouraging to see that the 
support functions go rather smoothly to zero at the region boundary, and this confirms that 
the boundary is having little effect on the results. 



V. DISCUSSION 

We have tried to do three things in this work: to develop the basic formalism needed to 
underpin O(N) DFT-pseudopotential methods; to implement one such method and identify 
the main technical issues in doing so; and to present the results of tests on a simple but 
important system, which allow us to gauge the usefulness of the method. We have shown that 
a rather general class of O(N) DFT-pseudopotential methods can be based on a formulation 
of DFT in terms of the density matrix, and that this formulation is equivalent to commonly 
used versions of DFT that operate with fractional occupation numbers. From this viewpoint, 
the key challenge is to ensure that the eigenvalues of the variable density matrix lie between 
and 1, and we have seen that the method of LNV ]nj gives a way of doing this. The 
implementation of the basic ideas has been achieved by performing all calculations in real 
space, with the DFT integrals approximated by sums on a grid - except for the use of FFT 
to treat the Hartree term. An alternative here would be to work with atomic-like basis 
functions, but we note that the use of a grid preserves an important link with conventional 
plane-wave methods, as will be analysed in more detail elsewhere. Our test results on 
perfect-crystal Si show that the total energy converges rapidly as the real-space cutoffs are 
increased, and that it is straightforward to achieve a precision comparable with that of 
normal plane-wave calculations. 
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An important question for any O(N) method is the system size at which it starts to 
beat a standard 0(N 3 ) method - a plane- wave method in the present case. This will clearly 
depend strongly on the system, but even for Si it is too soon to answer it on the basis of 
practical calculations. The cross-over point depends on the prefactor in the linear scaling, 
and this is strongly affected by the efficiency of the coding. All we have attempted to do 
here is to address the problem of achieving O(N) behavior. The question of the prefactor is 
a separate matter, which will need separate investigation. 

It should be clear that there is much more to do before the present methods can be 
routinely applied to real problems. We have deliberately not discussed in detail the problems 
of doing calculations on a real-space grid. Such problems have been discussed outside the 
linear-scaling context in several recent papers [j22| , [23f , and it should be possible to apply the 
advances reported there to O(N) DFT calculations. In particular, curvilinear grids f24]]25|| 
for the treatment of strongly attractive pseudopotentials are likely to be very important for 
O(N) calculations. We have also not discussed here the calculation of forces on the atoms, 
the problems that may arise when the boundaries of support regions cross grid points, and 
the general question of translational invariance within grid-based techniques. 

We have noted already that our method is related to other recently proposed methods. As 
shown in sec. 2, the Mauri- Galli- Car scheme is obtained from ours by taking an alternative 
polynomial expression for the density matrix p in terms of the auxiliary matrix a. It would 
clearly be interesting to repeat the calculations done here using this approach. In a sense, this 
is what has already been done by Hierse and Stechel [0 , except that instead of performing 
calculations on a grid, they use a minimal atomic basis set. The hydrocarbon systems used 
by them for test purposes are also rather different from the Si crystal we have studied. 

Finally, we note that our linear-scaling scheme is intended for calculations on very large 
systems, and this means that parallel implementation will play a key role. The test calcu- 
lations we have presented were, in fact, performed on a massively parallel machine, and the 
parallel-coding techniques we have developed will be described in a separate paper. 
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APPENDIX A: DERIVATIVES OF THE TOTAL ENERGY 

We derive here expressions for the derivatives dE tot / dL a p and SE tot /S(f) a (r). 



written as the trace of some operator acting on the density matrix, as is the case for the 
kinetic and pseudopotential energies (see Eq. (|^)), and those that depend only on the diag- 
onal elements of p, i.e. the electron density, namely the Hartree and exchange- correlation 
energies. The Madelung term, E M , does not depend on either L a p or (fi a , so it will make 
no contribution to the variation in total energy as these are changed. Denoting by E c the 
kinetic or pseudopotential contribution to the energy, we have: 



1. Derivative with respect to L a p 



In DFT, the total energy Eq. (Q) has two types of contributions: those that can be 



E c = 2^ [3(LSL) 7<5 - 2(LSLSL) 7S ] C h , 



(Al) 



where 




(A2) 



for the kinetic energy, and for the pseudopotential 




(A3) 
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where V ps is in general a non-local pseudopotential operator. Clearly, C y s does not depend 
on L a/3 for either operator, so this term does not change as L a/3 is varied. It is thus easy to 
see that 

dE 

— — = [6(SLC + CLS) Pa - A{SLSLC + SLCLS + CLSLS) Pa ) , (A4) 

where C is the matrix representation of the corresponding operator (kinetic energy or pseu- 
dopotential) in the basis of the support functions. 

For the Hartree and exchange-correlation contributions, denoted by E v , we have that 

= / dr i^^W (A5) 

dL al 3 J 8n(r) dL aj3 
The electron density n(r) is simply 2p(r, r), so that: 

= 2$> 7 (r)-^- [3(LSL), S - 2(LSLSL), S ] &(r). (A6) 

In the case of the Hartree contribution, we have 

T - rr = e 2 / dr , 1 ; = $ r , A7 
dn(r) J | r — r | 

where 3>(r) is the Hartree potential, while in the case of the exchange- correlation contribution 
we have 

^4 Wn)1= "- (r) ' (A8) 
where // xc is the exchange-correlation potential. If we take V(r) to represent either $(r) or 
/x xc (r) as the case may be, we see that expression (|A~5|) reduces to 

where 



[6(SLK + VLS) Pa - A(SLSLV + SLVLS + VLSLS)^] , (A9) 



Vap = Jdr (f> a (r)V(r)(f>p(T). (A10) 

By comparing this expression with Eq. (|A3j), it is easy to see that the partial derivative of 
the total energy with respect to L a p can be written more compactly as 
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— — = [6{SLH + HLS)p a - 4{SLSLH + SLHLS + HLSLS)p a ] , (All) 

where H a/ 3 is the sum of the corresponding matrix elements of the kinetic, pseudopotential, 
Hartree and exchange-correlation operators, i.e. the matrix representation of the Kohn- 
Sham Hamiltonian in the basis of the support functions. Recall that in practice, we do not 
vary E tot but rather vary Q = E tot — fiN with respect to L a p. However, it is trivial to 
obtain dVL/dL a p from Eq. (|A11 ) by simply substituting the matrix elements of H by those 



of H — fiS. Once this is done, Eq. (All) corresponds to Eq. (j34|). 



2. Functional derivative of E tot with respect to <j) a 

According to Eq. (|TTD, the density matrix, and hence the total energy, can be regarded 
as a function of the quantities (f) a and K a p. When <ft a is varied, E to t therefore varies firstly 
because of its direct dependence on <p a , and secondly because of the implicit dependence 
of the K a p matrix on <p a through its dependence on the overlap matrix elements S a p (see 
Eq. (|IoD); we call these two types of variation type 1 and type 2. 

To see how variations of type 1 behave, consider first the kinetic and pseudopotential 
energies. The type 1 variation of either of these is given by: 

{5E C ) X = 2Y,K 5l 5j dv 7 (r)CV> 5 (r) , (A12) 

where C represents the kinetic energy or the pseudopotential operator. The variation of the 
integral gives 

{5E C ) X = 2j2K 5l Jdr (SfaCfa + (pyCSfa) 

■yS 

= 2 J2 K s-y J dr ((50 7 (70 5 + 5<f> s C<f>J) . (A13) 

The last equality follows from the fact that C is a Hermitian operator. The type 1 variation 
of E c can therefore be expressed as: 

(^,) i =4E^(60ri(r), (A14) 
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where {C(j>p)(T) represents the action of the operator C on (pp evaluated at the point r. 

Now consider the type 2 variation of E c due to the variation of the overlap matrix 
elements. The variation of S a p is 

SSap = J dr (0^(50,3 + 5(j) a (t)p) , (A15) 

and the type 2 variation of E c is then obtained by applying this expression to Eq. (|A1|) . 
After a little manipulation, one obtains: 

T) 1 

p 



-) = 12 Y,{LCL) ap( j>p{v) - 8 J2(LSLCL + LCLSL) af3 ^(r) . (A16) 

V 2 8 3 



Here, C is the matrix whose elements are: 

C aP = J dv(j> a C(f>p. (A17) 

Combining Eqs. ( |A14j ) and ( |A16| ), we obtain the following expression for the total variations 
of the kinetic and pseudopotential energies: 

5E r 

3 



4 E [ K *pC + 3(LCL) a/3 - 2(LSLCL + LCLSL) a/3 ] <f> p (r). (A18) 



For the remaining terms (Hartree and exchange-correlation), variation in the energy 
results from variation in the electron density. Thus we need to calculate 

Again, we will have variations coming directly from the change in a (r) and variations 
coming indirectly from changes in the overlap matrix elements. The total variation of n(r') 
will be 

^p- = 25(r - r') £[<M r )^ + K<#M*)\ 

+ 2E^(r> 7 (r')^[3(L5L)^ - 2{LSLSL) Pl ] . (A20) 
Substituting this expression into 
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SK - fdr'V{v') 5 ^l, (A21) 



where the quantity V(r) = SE V /Sn(r') represents the Hartree or exchange- correlation po- 
tential, we find, after some manipulation: 

5E 

—f- = AY J [K aP V{v) + 3(LVL) a p - 2(LSLVL + LVLSL)^}^) • (A22) 

Combining this expression for the Hartree and exchange-correlation derivatives with 
Eq. ( |A18| ) for the kinetic and pseudopotential derivatives, we find: 

— = 4 Y\K a pH + 3(LHL) a p - 2(LSLHL + LHLSL) aP ](j>p{v) , (A23) 

where H is the Kohn-Sham operator, and H is its matrix representation in the basis of 
support functions. 

Note that in the practical grid-based calculations, the derivative we actually want is 
dE tot /d(f> a (ri), which describes the variation of E tot with respect to change of <p Q at the grid 
point re- The formula for dE tot / d(p a {jC{) is identical to Eq. ( |A23|) except that we need to 
multiply by the volume per grid point 8u. 
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FIGURES 

FIG. 1. (a) Total energy per atom as a function of the support region radius R reg with Rl = 
5 A. (b) Total energy per atom as a function of the range of the L matrix, Rl, for two different 
support region radii, R re g = 2.04 and 2.38 A. 

FIG. 2. Support functions after minimisation of the total energy with R reg = 3.05 and 
Rl = 5.0 A. (a) s-like support function, and (b) p^-like support function. 
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